Broad Phonetic Classes for Speaker Verification with Noisy, Large-Scale Data

Authors

  • Howard Lei
  • Nikki Mirghafori
Abstract

While the incorporation of phonetic information has contributed to speaker verification improvements for lexically unconstrained speech in the past, improvements have not been widely observed using the state-of-the-art i-vector system, which typically performs best using a "bag-of-frames" approach. This work explores ways to incorporate Broad Phonetic Class (BPC) information for the i-vector system with noisy speech data that is not lexically constrained. Different approaches for combining the BPCs have been examined. Results suggest that, through parallelization and combination strategies, BPCs may contribute to roughly a 13% improvement over an i-vector baseline system. However, confounding factors such as increased parameter size, use of noise-generated speech data, and the advantage of combination strategies are potential caveats to attributing the improvement to the discriminating power of BPCs alone. This work was funded by Air Force Research Laboratory (AFRL) award FA8750-12-1-0016. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the AFRL.

Similar resources

Speaker verification based on broad phonetic categories

In this work we present a speaker verification system based on 4 broad phonetic categories: vowels+diphthongs, fricatives, glides+nasals, and silence+stops. Using these categories separately, it is observed that vowels, diphthongs, and fricatives are the most important categories for speaker verification. This observation confirms the results from the analysis of speaker and channel variability...

Phonetic Class-based Spea

Phonetic Class-based Speaker Verification (PCBV) is a natural refinement of the traditional single Gaussian Mixture Model (Single GMM) scheme. The aim is to accurately model the voice characteristics of a user on a per-phonetic class basis. The paper describes briefly the implementation of a representation of the voice characteristics in a hierarchy of phonetic classes. We present a framework t...

Phonetic Speaker Recognition

The aim of this study is to answer two questions regarding the use of phonetic information for speaker modelling. We formulate answers for (1) what are the discriminative powers of broad phonetic classes for the task of speaker identification? (2) Are the phonetic speaker models more suitable for speaker recognition than standard models?

Mixture of Auto-Associative Neural Networks for Speaker Verification

The paper introduces a mixture of auto-associative neural networks for speaker verification. A new objective function based on posterior probabilities of phoneme classes is used for training the mixture. This objective function allows each component of the mixture to model part of the acoustic space corresponding to a broad phonetic class. This paper also proposes how factor analysis can be app...

Improving Speaker Recognition Performance Using Phonetically Structured Gaussian Mixture Models

Throughout the past few years it has been shown that Gaussian Mixture Models (GMM) are highly suitable for speaker identification and verification. Nevertheless these models try to represent primarily the distribution of the available training data neglecting any possible phonetic information which might be of worth. In our paper we present a recognition system using multiple speaker GMMs based...

Publication date: 2014